TR 08-022 Bayesian Co-clustering

Authors

  • Arindam Banerjee
  • Hanhuai Shan
Abstract

In recent years, co-clustering has emerged as a powerful data mining tool for analyzing dyadic data that connects two entities. However, almost all existing co-clustering techniques are partitional and allow each row and column of a data matrix to belong to only one cluster. Several current applications, such as recommendation systems and market basket analysis, can benefit substantially from mixed membership of rows and columns. In this paper, we present Bayesian co-clustering (BCC) models that allow mixed membership in row and column clusters. BCC maintains separate Dirichlet priors over the mixed memberships of rows and columns, and assumes each observation is generated by an exponential family distribution corresponding to its row and column clusters. We propose a fast variational algorithm for inference and parameter estimation. The model handles sparse matrices naturally, because inference is based only on the non-missing entries. In addition to finding co-cluster structure in the observations, the model outputs a low-dimensional co-embedding and accurately predicts missing values in the original matrix. We demonstrate the efficacy of the model through experiments on both simulated and real data.
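
The generative story in the abstract can be sketched in a few lines of Python. This is only one plausible reading of it, not the authors' implementation: a Gaussian stands in for the exponential family distribution, and the function name `sample_bcc`, its parameters, and all default values are hypothetical choices made for illustration.

```python
import numpy as np

def sample_bcc(n_rows, n_cols, k_row, k_col, alpha_row=0.5, alpha_col=0.5,
               sigma=0.1, observed_frac=0.3, seed=0):
    """Sample a sparse matrix from an assumed BCC-style generative process."""
    rng = np.random.default_rng(seed)

    # Mixed memberships: one Dirichlet draw per row and one per column,
    # using separate (here symmetric) Dirichlet priors.
    pi_row = rng.dirichlet(np.full(k_row, alpha_row), size=n_rows)
    pi_col = rng.dirichlet(np.full(k_col, alpha_col), size=n_cols)

    # Assumption: each (row cluster, column cluster) pair has a Gaussian
    # with its own mean and a shared standard deviation sigma.
    means = rng.normal(0.0, 3.0, size=(k_row, k_col))

    # Only a random subset of entries is observed; the rest stay NaN.
    mask = rng.random((n_rows, n_cols)) < observed_frac
    x = np.full((n_rows, n_cols), np.nan)

    for u, v in zip(*np.nonzero(mask)):
        z_row = rng.choice(k_row, p=pi_row[u])  # row cluster for this entry
        z_col = rng.choice(k_col, p=pi_col[v])  # column cluster for this entry
        x[u, v] = rng.normal(means[z_row, z_col], sigma)

    return x, mask, pi_row, pi_col, means
```

Calling `sample_bcc(20, 15, 3, 2)` returns a 20 x 15 matrix with NaN at unobserved positions, together with the membership vectors that generated it. Because each entry draws its own cluster pair, a single row can contribute to several co-clusters, which is what separates mixed membership from a partitional assignment.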

Related resources

TR 08-042 Infobionics Server - the next generation database

This paper describes the ‘Infobionics Server’, a next-generation database, also referred to as the ‘Cellular Database Server’, that is based on a novel ‘cellular’ data model.


Smaller is tougher

A.R. Beaber (a), J.D. Nowak (b), O. Ugurlu (c), W.M. Mook (d), S.L. Girshick (e), R. Ballarini (f) & W.W. Gerberich (a). (a) Department of Chemical Engineering and Materials Science, University of Minnesota, 421 Washington Ave SE, Minneapolis, MN 55455, USA; (b) Hysitron Incorporated, 10025 Valley View Road, Minneapolis, Minnesota 55344, USA; (c) Characterization Facility, University of Minnes...


Small size strength dependence on dislocation nucleation

J.D. Nowak, A.R. Beaber, O. Ugurlu, S.L. Girshick and W.W. Gerberich*. Hysitron Incorporated, 10025 Valley View Road, Minneapolis, MN 55344, USA; Department of Chemical Engineering and Materials Science, University of Minnesota, 421 Washington Ave SE, Minneapolis, MN 55455, USA; Characterization Facility, University of Minnesota, Minneapolis, MN 55455, USA; Department of Mechanical Engineering, Uni...


Department of Computer Science and Engineering, University of Minnesota, 4-192 EECS Building, 200 Union Street SE, Minneapolis, MN 55455-0159 USA. TR 04-021 gCLUTO – An Interactive Clustering, Visualization, and Analysis System

Recently published studies have shown that partitional clustering algorithms that optimize certain criterion functions, which measure key aspects of inter- and intra-cluster similarity, are very effective in producing hard clustering solutions for document datasets and outperform traditional partitional and agglomerative algorithms. In this paper we study the extent to which these criterion funct...


Publication date: 2008